Headline extraction based on a combination of uni- and multidocument summarization techniques
Abstract
The TNO system for multi-document summarisation is based on an extraction approach. For headline generation, we extended the system to extract the most informative topical noun phrase. The cluster topic is defined as the most frequent term occurring in the most salient sentences of the documents. The core of the system is a probabilistic model that estimates the log-odds of sentence salience from a number of features, including sentence position, sentence length, cue phrases and a language-model-based content score. The model parameters were estimated on annotated training data.
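The two scoring steps in the abstract (a log-odds salience model over sentence features, then the cluster topic as the most frequent term in the most salient sentences) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TNO implementation: the weights, cue phrases, stopword list, length normalisation and the unigram log-ratio used as the "language-model-based content score" are all hypothetical, whereas the actual system estimates its parameters on annotated training data.

```python
import math
import re
from collections import Counter

# Hypothetical parameters: the TNO system estimates its weights on annotated
# training data; these values only illustrate the shape of the model.
WEIGHTS = {"bias": -2.0, "position": 1.5, "length": 0.8, "cue": 0.7, "content": 1.2}
CUE_PHRASES = ("according to", "officials said")  # hypothetical cue phrases
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was",
             "said", "for", "by", "with", "at"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def unigram_lm(texts):
    """Return (term counts, total token count) for a collection of texts."""
    counts = Counter(w for t in texts for w in tokenize(t))
    return counts, sum(counts.values())


def content_score(sentence, cluster_lm, background_lm, vocab_size):
    """Assumed content score: mean log-ratio of cluster vs. background
    unigram probability, with add-one smoothing."""
    (cc, cn), (bc, bn) = cluster_lm, background_lm
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    total = sum(math.log(((cc[w] + 1) / (cn + vocab_size))
                         / ((bc[w] + 1) / (bn + vocab_size))) for w in tokens)
    return total / len(tokens)


def salience_log_odds(sentence, position, doc_len, content):
    """Linear log-odds of salience over the feature types the abstract lists."""
    feats = {
        "bias": 1.0,
        "position": 1.0 - position / max(doc_len, 1),  # earlier sentences score higher
        "length": min(len(tokenize(sentence)), 30) / 30.0,  # capped, normalised length
        "cue": float(any(c in sentence.lower() for c in CUE_PHRASES)),
        "content": content,
    }
    return sum(WEIGHTS[k] * v for k, v in feats.items())


def salience_probability(log_odds):
    """Convert log-odds back to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def cluster_topic(docs, background_texts, top_k=3):
    """Most frequent non-stopword term in the top_k most salient sentences.
    `docs` is a list of documents, each a list of sentence strings."""
    sents = [(s, i, len(d)) for d in docs for i, s in enumerate(d)]
    cluster_lm = unigram_lm(s for s, _, _ in sents)
    background_lm = unigram_lm(background_texts)
    vocab_size = len(set(cluster_lm[0]) | set(background_lm[0]))

    def score(item):
        s, i, n = item
        content = content_score(s, cluster_lm, background_lm, vocab_size)
        return salience_log_odds(s, i, n, content)

    top = [s for s, _, _ in sorted(sents, key=score, reverse=True)[:top_k]]
    counts = Counter(w for s in top for w in tokenize(s) if w not in STOPWORDS)
    return counts.most_common(1)[0][0] if counts else None


# Toy usage with invented sentences.
docs = [
    ["The earthquake struck the coastal city on Monday.",
     "Rescue teams searched the rubble.", "Weather was mild."],
    ["Officials said the earthquake killed dozens.",
     "The earthquake damaged the port.", "Markets were closed."],
]
background = ["Markets were closed on Monday.",
              "The weather was mild and sunny.", "Teams played football."]
print(cluster_topic(docs, background))  # earthquake
```

Scoring in log-odds space lets each feature contribute additively, and `salience_probability` recovers a probability when one is needed. The headline step described in the abstract would then pick the most informative noun phrase containing the topic term; that requires a noun-phrase chunker or parser and is not shown here.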
Similar papers
Multidocument Summarization via Information Extraction
Although recent years have seen increased and successful research efforts in single-document summarization, multi-document summarization and information extraction, very few investigations have explored the potential of merging summarization and information extraction techniques. This paper presents and evaluates the initial version of RIPTIDES, a system that combines information ...
Multidocument Summarization with GISTexter
This paper presents the architecture and the multidocument summarization techniques implemented in the GISTEXTER system. The paper presents an algorithm for producing incremental multi-document summaries if extraction templates of good quality are available. An empirical method of generating ad-hoc templates that can be populated with information extracted from texts by automatically acquired e...
Multi-Document Summarization By Sentence Extraction
This paper discusses a text extraction approach to multidocument summarization that builds on single-document summarization methods by using additional available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical in ...
Detecting Discrepancies in Numeric Estimates Using Multidocument Hypertext Summaries
To aid analysts in detecting discrepancies in numeric estimates in news articles from multiple sources, we propose the automatic generation of hypertext summaries that include a high-level textual overview; tables of all comparable numeric estimates, organized to highlight discrepancies; and targeted access to supporting information from the original articles. The RIPTIDES system, which exempli...
Abstractive Multi-document Summarization by Partial Tree Extraction, Recombination and Linearization
Existing work on abstractive multidocument summarization utilises phrase structures extracted directly from the input documents to generate summary sentences. These methods can suffer from a lack of consistency and coherence when merging phrases. We introduce a novel approach for abstractive multidocument summarization through partial dependency tree extraction, recombination and linearization...